
fix: sanitize store addrs in logs #8078

Open
Detachm wants to merge 1 commit into GreptimeTeam:main from Detachm:codex/sanitize-store-addrs-logs

Conversation

Contributor

@Detachm Detachm commented May 7, 2026

Closes #7525

Sanitizes store_addrs before logging kvbackend construction and strengthens connection string sanitization for URL and key-value formats.

The fix keeps useful host/database information in logs while redacting credentials such as URL userinfo and password-like key-value fields.
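
To illustrate the kind of output described above, here is a minimal, dependency-free sketch of userinfo redaction. It is not the code in this PR, which parses URLs with the url crate; the helper name `redact_userinfo` is hypothetical, and keeping the user name while hiding only the password is an assumption about the desired output. An unencoded `/`, `?` or `#` inside the password would defeat this simplified scan, which is one reason a real URL parser is the safer choice.

```rust
/// Placeholder written in place of a redacted value.
const REDACTED: &str = "***";

/// Hypothetical, simplified helper: hides the password in URL userinfo while
/// keeping scheme, user, host and path. Input without "://" or without
/// userinfo is returned unchanged.
fn redact_userinfo(addr: &str) -> String {
    // Anything without "://" is not treated as a URL here.
    let Some((scheme, rest)) = addr.split_once("://") else {
        return addr.to_string();
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(authority_end);
    // Userinfo is everything before the last '@' in the authority.
    let Some((userinfo, host)) = authority.rsplit_once('@') else {
        return addr.to_string();
    };
    // Keep the user name and replace the password, if there is one.
    let userinfo = match userinfo.split_once(':') {
        Some((user, _password)) => format!("{}:{}", user, REDACTED),
        None => userinfo.to_string(),
    };
    format!("{}://{}@{}{}", scheme, userinfo, host, tail)
}
```

With this sketch, `mysql://user:pw@127.0.0.1:3306/db` would be logged as `mysql://user:***@127.0.0.1:3306/db`, and a plain `127.0.0.1:2379` endpoint passes through untouched.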

Validation:

  • cargo test -p common-meta test_sanitize_connection_string --quiet
  • cargo test -p cli test_sanitize_store_addrs --quiet
  • cargo test -p cmd test_start_command_debug_sanitizes_store_addrs --quiet
  • cargo test -p meta-srv test_metasrv_options_debug_sanitizes_store_addrs --quiet
  • cargo fmt --all -- --check
  • git diff --check

@github-actions github-actions Bot added the size/S and docs-not-required labels May 7, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the connection string sanitization logic to more robustly redact sensitive information like passwords from logs and debug outputs. It introduces the url crate for parsing URL-formatted strings and utilizes a regex for key-value pairs. The feedback suggests expanding the set of redacted keys to include 'token' and 'secret' for both URL query parameters and key-value connection strings.

Comment thread src/common/meta/src/kv_backend/util.rs Outdated
const REDACTED: &str = "***";

static SENSITIVE_KV_RE: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r#"(?i)(^|\s)(password|pass|pwd)\s*=\s*('[^']*'|"[^"]*"|\S+)"#).unwrap()
Contributor

security-medium medium

The regex for sensitive key-value pairs could be more comprehensive. Common sensitive keys in connection strings also include token and secret. Adding these would improve the robustness of the sanitization.

Suggested change
-     Regex::new(r#"(?i)(^|\s)(password|pass|pwd)\s*=\s*('[^']*'|"[^"]*"|\S+)"#).unwrap()
+     Regex::new(r#"(?i)(^|\s)(password|pass|pwd|token|secret)\s*=\s*('[^']*'|"[^"]*"|\S+)"#).unwrap()
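
As a rough illustration of what the extended key list means for key-value connection strings, here is a std-only sketch. It is deliberately simpler than the regex above: the names `SENSITIVE_KEYS` and `redact_key_values` are hypothetical, fields are assumed to be separated by whitespace with no spaces around `=`, and quoted values containing spaces are not handled (the regex covers those cases). It also collapses runs of whitespace to a single space.

```rust
/// Placeholder written in place of a redacted value.
const REDACTED: &str = "***";

/// Hypothetical key list mirroring the suggested regex alternation.
const SENSITIVE_KEYS: [&str; 5] = ["password", "pass", "pwd", "token", "secret"];

/// Hypothetical helper: redacts the value of every `key=value` field whose
/// key matches one of `SENSITIVE_KEYS`, ignoring ASCII case.
fn redact_key_values(conn: &str) -> String {
    conn.split_whitespace()
        .map(|field| match field.split_once('=') {
            // Sensitive key: keep the key as written, replace the value.
            Some((key, _value))
                if SENSITIVE_KEYS.iter().any(|k| key.eq_ignore_ascii_case(k)) =>
            {
                format!("{}={}", key, REDACTED)
            }
            // Everything else is copied through unchanged.
            _ => field.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, `host=localhost user=postgres PASSWORD=hunter2 dbname=meta` would come out as `host=localhost user=postgres PASSWORD=*** dbname=meta`.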

Comment thread src/common/meta/src/kv_backend/util.rs Outdated
fn is_sensitive_key(key: &str) -> bool {
    matches!(
        key.to_ascii_lowercase().as_str(),
        "password" | "pass" | "pwd"
Contributor

security-medium medium

Consistent with the regex improvement, the is_sensitive_key function should also check for token and secret keys in URL query parameters.

Suggested change
-         "password" | "pass" | "pwd"
+         "password" | "pass" | "pwd" | "token" | "secret"
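
For reference, the function with this suggestion applied would presumably look like the sketch below. The diff context above is truncated, so the closing lines are an assumption. Note that the match is exact after lowercasing, so a key such as `access_token` would not count as sensitive under this sketch.

```rust
/// Returns true for parameter names whose values should be redacted.
/// The comparison is ASCII case-insensitive but otherwise exact.
fn is_sensitive_key(key: &str) -> bool {
    matches!(
        key.to_ascii_lowercase().as_str(),
        "password" | "pass" | "pwd" | "token" | "secret"
    )
}
```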

Signed-off-by: Detachm <42765252+Detachm@users.noreply.github.com>
@Detachm Detachm force-pushed the codex/sanitize-store-addrs-logs branch from 8f92df3 to e56fc56 Compare May 7, 2026 12:19
Contributor Author

Detachm commented May 7, 2026

Updated the sanitizer to also redact token and secret in both URL query parameters and key-value connection strings, with test coverage added for both formats.

@Detachm Detachm marked this pull request as ready for review May 8, 2026 12:40
@Detachm Detachm requested review from a team, MichaelScofield and WenyXu as code owners May 8, 2026 12:40
@WenyXu WenyXu requested a review from Copilot May 8, 2026 12:42
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses sensitive credential leakage by sanitizing store_addrs before they are emitted in logs/debug output, improving redaction for both URL-style DSNs and key-value connection strings across metasrv/cmd/cli usage.

Changes:

  • Strengthened sanitize_connection_string to redact URL userinfo and sensitive query/key-value parameters (e.g., password, token, secret).
  • Updated CLI kvbackend construction log to print sanitized store_addrs.
  • Added targeted tests ensuring Debug/log output does not contain raw secrets for StartCommand and MetasrvOptions, plus expanded sanitizer unit tests.
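
One common way to keep secrets out of Debug output is a hand-written `Debug` impl that sanitizes the field before formatting it. The sketch below shows that pattern under stated assumptions: `ExampleOptions`, `max_retries` and the crude `sanitize` stand-in are hypothetical, and the PR may achieve the same result differently for StartCommand and MetasrvOptions.

```rust
use std::fmt;

/// Placeholder written in place of redacted credentials.
const REDACTED: &str = "***";

/// Hypothetical stand-in for the real sanitizer: hides everything between
/// "://" and the last '@'. It would misfire on an '@' that appears after the
/// host, so it is only suitable for illustration.
fn sanitize(addr: &str) -> String {
    match (addr.find("://"), addr.rfind('@')) {
        (Some(start), Some(at)) if start + 3 <= at => {
            format!("{}{}{}", &addr[..start + 3], REDACTED, &addr[at..])
        }
        _ => addr.to_string(),
    }
}

/// Hypothetical options struct; only the `store_addrs` name comes from the PR.
struct ExampleOptions {
    store_addrs: Vec<String>,
    max_retries: u32,
}

impl fmt::Debug for ExampleOptions {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Sanitize each address before it reaches the formatter.
        let sanitized: Vec<String> = self.store_addrs.iter().map(|a| sanitize(a)).collect();
        f.debug_struct("ExampleOptions")
            .field("store_addrs", &sanitized)
            .field("max_retries", &self.max_retries)
            .finish()
    }
}
```

A test in the spirit of the ones listed above would format the struct with `{:?}` and assert that the raw password does not appear while the host and database still do.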

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/meta-srv/src/metasrv.rs | Adds a test ensuring MetasrvOptions debug output sanitizes store_addrs. |
| src/common/meta/src/kv_backend/util.rs | Reworks connection-string sanitization using url parsing + regex redaction; expands unit tests. |
| src/common/meta/Cargo.toml | Adds url dependency to support URL parsing in the sanitizer. |
| src/cmd/src/metasrv.rs | Adds a test ensuring StartCommand debug output sanitizes store_addrs. |
| src/cli/src/common/store.rs | Sanitizes store_addrs in kvbackend construction log; adds unit test for store addr sanitization. |
| Cargo.lock | Locks the added url dependency. |

Collaborator

@MichaelScofield MichaelScofield left a comment

Please fix the CI.

Comment on lines +44 to +56
let pairs = url
    .query_pairs()
    .map(|(key, value)| {
        let value = if is_sensitive_key(&key) {
            REDACTED.into()
        } else {
            value
        };
        (key.into_owned(), value.into_owned())
    })
    .collect::<Vec<_>>();

url.query_pairs_mut().clear().extend_pairs(pairs);
Collaborator

Is it possible to redact in place? Like this:

Suggested change
- let pairs = url
-     .query_pairs()
-     .map(|(key, value)| {
-         let value = if is_sensitive_key(&key) {
-             REDACTED.into()
-         } else {
-             value
-         };
-         (key.into_owned(), value.into_owned())
-     })
-     .collect::<Vec<_>>();
- url.query_pairs_mut().clear().extend_pairs(pairs);
+ url
+     .query_pairs_mut()
+     .for_each_mut(|(key, value)| {
+         if is_sensitive_key(&key) {
+             *value = REDACTED.into()
+         }
+     })

Contributor Author

Thanks for the suggestion. I tried the in-place redaction approach, but url::Url::query_pairs_mut() returns a serializer and does not expose a mutable iterator over existing query pairs, so for_each_mut is not available in the current url API.

The current implementation keeps using url to parse the query pairs, redacts sensitive values, then clears and writes the pairs back via extend_pairs. This avoids hand-rolling query-string parsing while preserving the same redaction behavior.
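
The same read, redact, then write-back shape can be shown without the url crate. The sketch below is only an analogue: `redact_query` is a hypothetical helper that works on a raw query string, does no percent-decoding, and writes a bare key back as `key=`. It illustrates why the pairs are collected into owned values first: the read borrow has to end before the buffer can be cleared and refilled.

```rust
/// Placeholder written in place of a redacted value.
const REDACTED: &str = "***";

/// Hypothetical std-only analogue of the collect, redact, write-back flow.
/// `is_sensitive` decides which keys get their values replaced.
fn redact_query(query: &mut String, is_sensitive: impl Fn(&str) -> bool) {
    // 1. Read: borrow the query immutably and collect owned, redacted pairs.
    let pairs: Vec<(String, String)> = query
        .split('&')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let (key, value) = part.split_once('=').unwrap_or((part, ""));
            let value = if is_sensitive(key) { REDACTED } else { value };
            (key.to_string(), value.to_string())
        })
        .collect();

    // 2. Write: the immutable borrow is over, so the buffer can be reused.
    query.clear();
    for (i, (key, value)) in pairs.iter().enumerate() {
        if i > 0 {
            query.push('&');
        }
        query.push_str(key);
        query.push('=');
        query.push_str(value);
    }
}
```

Called with a predicate that matches `password` and `token` case-insensitively, `sslmode=disable&password=hunter2&Token=abc` becomes `sslmode=disable&password=***&Token=***`.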

I also checked the visible failing checks. They seem to come from an older cancelled workflow run, while the latest CI run for the current head commit has passed.

Member

I think it's good. cc @MichaelScofield

@killme2008
Member

@Detachm The CI needs to be fixed.

Labels

docs-not-required, size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove sensitive information from store_addrs in logs

4 participants